Complete genome sequence data of Priestia megaterium strain MARUCO02 isolated from marine mangrove-inhabited sediments of the Indian Ocean on the Bagamoyo Coast

Priestia is a genus of biotechnologically important bacteria adapted to a wide range of environmental conditions, including marine sediments. Here, we isolated a strain from mangrove-inhabited marine sediments at Bagamoyo and used whole-genome sequencing to recover its complete genome. De novo assembly with Unicycler (v0.4.8) and annotation with the Prokaryotic Genome Annotation Pipeline (PGAP) revealed that the genome contains one chromosome (5,549,131 bp) with a GC content of 37.62%. Further analysis showed that the genome contains 5,687 coding sequences (CDS), 4 rRNAs, 84 tRNAs, 12 ncRNAs and at least 2 plasmids (1,142 bp and 6,490 bp). antiSMASH-based secondary metabolite analysis revealed that the novel strain MARUCO02 contains gene clusters for the biosynthesis of diverse MEP/DOXP-dependent isoprenoids (e.g., carotenoids), siderophores (synechobactin and schizokinen) and polyhydroxyalkanoates (PHA). The genome also encodes enzymes required for the generation of hopanoids, compounds that confer adaptation to harsh environmental conditions, including industrial cultivation conditions. The data from this novel Priestia megaterium strain MARUCO02 can serve as a reference and support genome-guided selection of strains for the production of isoprenoids and of industrially useful siderophores and polymers amenable to biosynthetic manipulation in biotechnological processes.


Keywords:
Priestia megaterium; Genome sequence; Phylogenomics; Biosynthetic gene clusters; Carotenoids; Polyhydroxyalkanoates; MARUCO02

Value of the Data
• The genome data identify Priestia megaterium strain MARUCO02 as a potential strain for studying the industrial production of enzymes, siderophores, polymers and isoprenoid compounds such as carotenoids.
• The genome data can support scientific innovation in laboratory and industrial settings.
• Together, the raw and analyzed datasets are valuable for comparative genomic studies characterizing marine plant-associated and plant growth-promoting Priestia species.

Objective
Priestia megaterium has emerged as a bacterial species with applications in the biotechnology industry as a source of enzymes, vitamins, pigments and polymers. This study sought to recover novel strains of Priestia megaterium from local marine ecosystems for biotechnological applications. We therefore aimed to use genome data to uncover the biotechnological relevance of Priestia megaterium strain MARUCO02, isolated from Bagamoyo, Tanzania.

Data Description
The dataset in this article describes the genomic features underlying the biotechnological potential of Priestia megaterium strain MARUCO02 as a source of various high-value bioproducts. A total of 16,631,008 paired-end reads (raw data) were generated, and these were reduced to 10,429,051 after filtering. De novo assembly yielded 39 contigs with an N50 of 4,283,054 bp and a total length of 5,614,752 bp. The genome features are listed in Table 1: the resolved chromosome is 5,549,131 bp long and contains 5,592 coding sequences (CDS) and 88 RNAs. Table 2 lists the average nucleotide identity (ANI) values that identify the closest relatives of strain MARUCO02, which are useful for future comparative studies. Fig. 1 has two panels. Panel (a) shows the phylogenomic position of Priestia megaterium strain MARUCO02 based on whole-proteome GBDP distances generated with TYGS [1]. Panel (b) shows the phylogeny of the conserved DHHA1 domain-containing protein, inferred by maximum likelihood under the LG substitution model with IQ-TREE [2]. Historically, Bacillus and Priestia species were placed together in the genus Bacillus. More recently, genomic studies have consistently split the genus Bacillus into multiple taxonomic groups, leading to the placement of Priestia in a separate genus [3][4][5]. This genome dataset places our strain in the genus Priestia and the species Priestia megaterium.
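The assembly statistics above can be cross-checked with a few lines of arithmetic. The following is a minimal Python sketch. The `n50` helper is a generic implementation of the standard N50 definition, not the code used by Unicycler. The toy contig lengths are hypothetical, and only the read counts and base-pair totals come from the text.

```python
# Sanity checks on the assembly statistics reported for strain MARUCO02.
# Read counts, N50 and base-pair totals come from the text; the toy
# contig list in the __main__ block is hypothetical.


def n50(contig_lengths):
    """Return the N50: sort contigs longest first and report the length of
    the contig at which the running total first reaches half the assembly."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(contig_lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: the running total always reaches the sum")


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100 * part / whole


RAW_READS = 16_631_008       # paired-end reads before filtering
FILTERED_READS = 10_429_051  # reads kept after filtering
ASSEMBLY_BP = 5_614_752      # total length of the 39 contigs
CHROMOSOME_BP = 5_549_131    # resolved chromosome
REPORTED_N50 = 4_283_054

read_retention = percent(FILTERED_READS, RAW_READS)
chromosome_share = percent(CHROMOSOME_BP, ASSEMBLY_BP)
# Two contigs of at least this length would exceed the assembly size, so an
# N50 above half the total can only belong to the single largest contig.
n50_is_largest_contig = REPORTED_N50 > ASSEMBLY_BP / 2

if __name__ == "__main__":
    print(f"reads retained after filtering: {read_retention:.1f} %")
    print(f"chromosome share of assembly:   {chromosome_share:.1f} %")
    print(f"N50 contig is the largest:      {n50_is_largest_contig}")
    print(f"toy N50: {n50([100, 80, 50, 30, 20])}")
```

Using the reported figures, about 62.7% of the reads survive filtering and the chromosome accounts for about 98.8% of the assembled bases. Because 4,283,054 bp is more than half of 5,614,752 bp, the N50 contig is necessarily the largest contig in the assembly.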
antiSMASH analysis of the biosynthetic potential for secondary metabolites identified seven regions in the genome (Supplementary file 2). Of these seven regions, only two (regions 1 and 7) significantly matched known biosynthetic gene clusters. Region 1 has three significant cluster hits, BGC0002470, BGC0002633 and BGC0002683, which correspond to the structurally related siderophores synechobactin and schizokinen (Fig. 2(a)). Region 7 has one significant hit, BGC0000645, which belongs to the carotenoid class of compounds.
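The region-level result can be kept in a small structured form that is easy to extend when comparing strains. The following is a minimal Python sketch. The region numbers and MIBiG accessions come from the text above. The dictionary layout and the helper names are illustrative assumptions and do not reproduce antiSMASH's own output format.

```python
# Significant MIBiG hits per antiSMASH region, as reported in the text.
# Regions 2-6 had no significant match to a known cluster. The layout is an
# illustrative assumption, not antiSMASH's native output format.
SIDEROPHORE = "siderophore (synechobactin/schizokinen)"

REGION_HITS = {
    1: {"BGC0002470": SIDEROPHORE,
        "BGC0002633": SIDEROPHORE,
        "BGC0002683": SIDEROPHORE},
    2: {}, 3: {}, 4: {}, 5: {}, 6: {},
    7: {"BGC0000645": "carotenoid"},
}


def regions_with_known_hits(region_hits):
    """Return region numbers that have at least one significant hit."""
    return sorted(region for region, hits in region_hits.items() if hits)


def total_hits(region_hits):
    """Count significant hits across all regions."""
    return sum(len(hits) for hits in region_hits.values())


if __name__ == "__main__":
    for region in regions_with_known_hits(REGION_HITS):
        for accession, product in REGION_HITS[region].items():
            print(f"region {region}: {accession} -> {product}")
```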
antiSMASH does not predict the chemical structures of the encoded compounds, but PRISM usually does. In this work, PRISM could not resolve any structure. However, it did confirm one cluster, the siderophore cluster, together with its core open reading frames (ORFs), which supports the BGCs predicted by antiSMASH. Fig. 2(b) shows the gene cluster for the biosynthesis of polyhydroxyalkanoates (PHAs), together with the enzymes responsible for the catalytic steps, from the initial reactions to the polymerization of monomers. Fig. 3 shows the non-mevalonate pathway for isoprenoid biosynthesis, also known as the 2-C-methyl-D-erythritol 4-phosphate/1-deoxy-D-xylulose 5-phosphate (MEP/DOXP) pathway. This pathway was predicted by antiSMASH and then recovered by manual curation of the PGAP genome annotation dataset. Chemical structures were drawn using ChemDraw 8.0 [6] and annotated with the enzymes catalyzing the key reactions.

Taxonomy and phylogenetic placement
We combined PATRIC, TYGS and PGAP annotation results to infer the taxonomic and phylogenetic position of strain MARUCO02. In PATRIC, the genome was annotated using the RAST tool kit (RASTtk) [7]. The annotation results (Table 2) confirmed that the strain belongs to the genus Priestia. The closest reference and representative genomes to MARUCO02 were identified with Mash/MinHash [8]. The closest relatives belong to the species Priestia megaterium and Priestia aryabhattai. However, closer analysis shows that P. aryabhattai matches P. megaterium with an average nucleotide identity (ANI) above the 95% species threshold, which suggests that the two may represent a single species with variable strains. Genomes of Priestia species encode a distinctive oligoribonuclease NrnB (also annotated as a cAMP/cGMP phosphodiesterase) and a DHH/DHHA1 superfamily protein. These proteins have been used as reliable molecular signatures that distinguish the genus from related genera [4]. We manually searched the protein sequences of the PGAP-recovered proteome and confirmed the presence of these genes (UYP05222.1, UYP05090.1, UYP08660.1 and UYP10200.1) in our strain. Guided by the recent taxonomic demarcation by Gupta et al. [4], we aligned the UYP10200.1 protein with its homologs from the Priestia and Bacillus clades. The resulting IQ-TREE phylogeny confirmed that MARUCO02 falls within the genus Priestia, together with P. megaterium and P. aryabhattai, among other strains (Fig. 1(b)).
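The Mash/MinHash screen and the 95% ANI threshold mentioned above can be illustrated with a toy implementation. Mash estimates the Jaccard index j of two k-mer sets from bottom-s MinHash sketches. It then converts j to the distance D = -(1/k) ln(2j/(1+j)), and for closely related genomes D roughly approximates 1 - ANI. This Python sketch is not the Mash tool itself. It swaps Mash's MurmurHash3 for BLAKE2b from the standard library, and it treats the D-to-ANI conversion as an approximation.

```python
import hashlib
import math

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Yield each k-mer or its reverse complement, whichever sorts first."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        revcomp = kmer.translate(_COMPLEMENT)[::-1]
        yield min(kmer, revcomp)


def _hash64(kmer):
    """64-bit hash from BLAKE2b (a stand-in; Mash itself uses MurmurHash3)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def sketch(seq, k=21, s=1000):
    """Bottom-s MinHash sketch: the s smallest distinct k-mer hashes."""
    return sorted({_hash64(km) for km in canonical_kmers(seq, k)})[:s]


def jaccard_estimate(sketch_a, sketch_b, s=1000):
    """Estimate Jaccard from the s smallest hashes of the merged sketches."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    union_bottom = sorted(set_a | set_b)[:s]
    if not union_bottom:
        return 0.0
    shared = sum(1 for h in union_bottom if h in set_a and h in set_b)
    return shared / len(union_bottom)


def mash_distance(j, k=21):
    """Mash distance D = -(1/k) ln(2j / (1 + j)); 1.0 when nothing is shared."""
    if j <= 0:
        return 1.0
    return -math.log(2 * j / (1 + j)) / k


def approx_ani_percent(distance):
    """For closely related genomes, ANI is roughly 100 * (1 - D)."""
    return 100 * (1 - distance)


def same_species(ani_percent, threshold=95.0):
    """Apply the conventional 95 % ANI species boundary."""
    return ani_percent >= threshold


if __name__ == "__main__":
    genome = "ACGTTGCAAGGCTTACCGATGCATTGACCGTAGGCTA"  # toy sequence
    a = sketch(genome, k=5, s=50)
    j = jaccard_estimate(a, a, s=50)
    d = mash_distance(j, k=5)
    print(j, d, same_species(approx_ani_percent(d)))
```

Real genome comparisons use the whole assemblies with the Mash defaults (k = 21, s = 1000). ANI values for species assignment, such as those in Table 2, come from dedicated ANI tools rather than from this approximation.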
Based on ANI, TYGS and IQ-TREE phylogenies, we assigned this strain to the species Priestia megaterium. Throughout the data, the strain identifier MARUCO02 indicates isolation and handling by Marian University College (MARUCO) and designates the second isolate (02) from our project.

Biosynthesis of Secondary Metabolites of Interest
Analysis with PRISM recovered only one cluster, which was identified as a siderophore biosynthetic cluster. The cluster contains three core open reading frames (ORFs), identified as the iron-binding IucA/IucC family siderophore biosynthesis protein (WP_182005752.1), thymidylate synthase (WP_182005751.1) and dihydrofolate reductase (WP_182005752.1). Based on the PGAP annotation and the corresponding BLASTp output, PRISM alone could not assign a clear product to this BGC. A product became clear only when the antiSMASH data were compared (Supplementary file 2). antiSMASH predicted three possible BGCs for two closely related siderophores, synechobactin and schizokinen (Fig. 2a). Schizokinen was first described in Priestia megaterium in the early 1970s as an iron-transporting molecule [11]. Most siderophores are known to be synthesized via the nonribosomal peptide synthetase (NRPS) pathway. In contrast, synechobactin and schizokinen are generated by the NRPS-independent siderophore synthetase (NIS) pathway and are well characterized in Cyanobacteria [12]. By combining the PRISM and antiSMASH BGC predictions, we highlight the potential of Priestia megaterium MARUCO02 to synthesize these siderophores, which are useful in bioremediation and medicine [12].
Fig. 2. (a) Structures of the siderophores synechobactin and schizokinen, predicted from antiSMASH analysis and drawn with ChemDraw. (b) The PHA biosynthetic gene cluster decoded from the genome annotation of P. megaterium strain MARUCO02. The cluster comprises two regulatory genes (PhaP and PhaQ) and the structural genes (PhaR, PhaB and PhaC) responsible for PHA polymer assembly.
In addition, the MARUCO02 genome contains a gene cluster for polyhydroxyalkanoate (PHA) synthesis (Fig. 2b) as well as possible degradative genes. The PHA gene cluster was characterized in Priestia megaterium about two decades ago [13, 14]. It consists of an operon with PhaP, PhaQ, PhaR, PhaB and PhaC. The cluster was not detected by antiSMASH or PRISM, so we manually searched for each candidate gene in the proteome recovered from the PGAP annotation and present them in Fig. 2(b). The two upstream genes, PhaP and PhaQ, form a regulatory unit. The three genes PhaR, PhaB and PhaC are responsible for generating PHA units and polymerizing them into complete PHA molecules [14]. Interestingly, we also identified a polyhydroxybutyrate (PHB) depolymerase gene in the PGAP annotation (NCBI accession UYP05899.1). The enzyme PHB depolymerase (EC 3.1.1.75) is of interest in biodegradation research [15, 16]. Its presence suggests that MARUCO02 could serve as a source of enzymes for the biodegradation of polymers, including plastics.
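The manual search described above amounts to screening annotated protein products for gene names and enzyme keywords. The following is a minimal Python sketch, assuming the PGAP proteome is available as FASTA headers of the form `>ACCESSION product text`. That header layout is an assumption, and the product wording and the `HYPOTHETICAL_*` accessions in the example are invented placeholders. Word-boundary matching avoids substring false positives such as "alpha" matching "pha", but every hit would still need BLASTp confirmation.

```python
import re


def parse_fasta_headers(lines):
    """Yield (accession, product) pairs from header lines like '>ACC product'."""
    for line in lines:
        if line.startswith(">"):
            accession, _, product = line[1:].strip().partition(" ")
            yield accession, product


def screen(annotations, keywords):
    """Map each keyword to accessions whose product contains it as a whole
    word (case-insensitive)."""
    patterns = {kw: re.compile(rf"\b{re.escape(kw)}\b", re.IGNORECASE) for kw in keywords}
    hits = {kw: [] for kw in keywords}
    for accession, product in annotations:
        for kw, pattern in patterns.items():
            if pattern.search(product):
                hits[kw].append(accession)
    return hits


# Gene names from the text plus the depolymerase of interest.
PHA_KEYWORDS = ["PhaP", "PhaQ", "PhaR", "PhaB", "PhaC", "depolymerase"]

# Hypothetical headers: only UYP05899.1 is a real accession from the text,
# and its product wording here is assumed.
EXAMPLE_HEADERS = [
    ">UYP05899.1 poly(3-hydroxybutyrate) depolymerase",
    ">HYPOTHETICAL_0001 class III poly(R)-hydroxyalkanoic acid synthase subunit PhaC",
    ">HYPOTHETICAL_0002 DNA gyrase subunit A",
]

if __name__ == "__main__":
    for keyword, accessions in screen(parse_fasta_headers(EXAMPLE_HEADERS), PHA_KEYWORDS).items():
        print(keyword, accessions)
```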
For terpenoid biosynthesis, the MARUCO02 genome carries genes of the MEP/DOXP pathway, which is responsible for the biosynthesis of an array of carotenoids. The cluster was first detected with the antiSMASH KnownClusterBlast algorithm [17], which showed up to 50% similarity to known carotenoid biosynthetic gene clusters. Our downstream analysis confirmed the 2-C-methyl-D-erythritol 4-phosphate (MEP)/1-deoxy-D-xylulose 5-phosphate (DOXP) pathway, with the potential to yield diverse carotenoids and hopanoids (Fig. 3). As described in multiple reports [18][19][20], the MEP/DOXP pathway, also known as the non-mevalonate pathway, is responsible for the biosynthesis of monoterpenes of essential oils, linalyl acetate, several sesquiterpenes, diterpenes, phytol and carotenoids. Starting from glycolytic intermediates, the enzyme 1-deoxy-D-xylulose 5-phosphate synthase (DXS) condenses pyruvate with glyceraldehyde 3-phosphate (GA-3P), releasing CO2, to form DOXP. DOXP reductoisomerase (DXR) then converts DOXP to MEP by NADPH-dependent reductive isomerization. The most important rate-determining steps include those catalyzed by DXS, squalene synthase (SQS) and phytoene synthase (CrtM, PSY) (Fig. 3).
Fig. 3. The carotenoid biosynthetic pathway encoded by the P. megaterium strain MARUCO02 genome. The pathway was reconstructed by combining antiSMASH BGC analysis with manual protein functional annotation and confirmation against databases (NCBI (https://blast.ncbi.nlm.nih.gov/Blastp) and UniProt (https://www.uniprot.org/blast)). Genes encoding the respective enzymes are shown in bold black letters, together with the accession numbers (bold blue) of individual protein sequences manually sorted from the PGAP annotation.
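The first two MEP/DOXP steps described above can be checked for element balance. The following Python sketch uses formulas for the neutral, fully protonated species so that charges can be ignored; this choice is an assumption. The formulas are pyruvic acid C3H4O3, glyceraldehyde 3-phosphate C3H7O6P, DOXP C5H11O7P and MEP C5H13O7P. The parser handles only bracket-free formulas.

```python
import re
from collections import Counter

_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Count atoms in a bracket-free formula such as 'C5H11O7P'."""
    counts = Counter()
    for element, number in _TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def side_total(formulas):
    """Sum atom counts over all species on one side of a reaction."""
    total = Counter()
    for formula in formulas:
        total += parse_formula(formula)
    return total


def imbalance(reactants, products):
    """Atoms gained (positive) or lost (negative) going from reactants to
    products; an empty dict means the equation is balanced."""
    left, right = side_total(reactants), side_total(products)
    return {el: right[el] - left[el]
            for el in left.keys() | right.keys() if right[el] != left[el]}


# DXS: pyruvate + GA-3P -> DOXP + CO2 (neutral forms assumed)
DXS = (["C3H4O3", "C3H7O6P"], ["C5H11O7P", "CO2"])
# DXR: DOXP -> MEP; the two extra H come from NADPH plus a proton
DXR = (["C5H11O7P"], ["C5H13O7P"])

if __name__ == "__main__":
    print("DXS imbalance:", imbalance(*DXS))
    print("DXR imbalance:", imbalance(*DXR))
```

The DXS step balances exactly once the released CO2 is counted. The DXR step shows a net gain of two hydrogens, which matches the reducing equivalents supplied by NADPH and a proton.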

Strain Isolation and DNA Extraction
Samples were collected from mangrove-inhabited sediments of the Indian Ocean near the shoreline of the Bagamoyo Coast in Tanzania (https://www.google.com/maps/@-6.424511,38.901958,14z). Three sediment samples were collected in sterile plastic bottles and stored at 4 °C in the laboratory. For bacterial isolation, approximately 5 g of sediment was suspended in 200 ml of 0.80% NaCl. The suspension was serially diluted (10⁻¹ to 10⁻⁶) in phosphate-buffered saline (PBS, pH 7.2), streaked onto nutrient agar (NA) plates and incubated at 28 °C for 48 h. One colony was selected for DNA extraction and identification. Total DNA was extracted using a ZymoBIOMICS DNA Miniprep Kit (ZR D4300) following the manufacturer's instructions.
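The isolation protocol involves some simple arithmetic: the amount of salt for the suspension and the amount of sediment represented in each dilution. The following Python sketch makes three assumptions that the text does not state. First, 0.80% NaCl is taken as weight/volume (g per 100 ml). Second, the ~5 g of sediment is taken to be suspended in the full 200 ml. Third, each tenfold step is taken to start from that initial suspension.

```python
# Arithmetic behind the isolation protocol. Assumptions (not stated in the
# text): 0.80 % NaCl is w/v, the ~5 g of sediment is suspended in the full
# 200 ml, and each tenfold dilution starts from that initial suspension.


def nacl_grams(percent_wv, volume_ml):
    """Grams of NaCl needed for a w/v percentage in a given volume."""
    return percent_wv / 100 * volume_ml


def sediment_per_ml(sediment_g, volume_ml, exponents=range(1, 7)):
    """Grams of original sediment represented per ml at each 10^-n dilution."""
    stock = sediment_g / volume_ml  # g sediment per ml of initial suspension
    return {n: stock * 10 ** -n for n in exponents}


if __name__ == "__main__":
    print(f"NaCl for 200 ml of 0.80 %: {nacl_grams(0.80, 200):.2f} g")
    for n, grams in sediment_per_ml(5, 200).items():
        print(f"10^-{n}: {grams:.1e} g sediment per ml")
```

Under these assumptions the initial suspension holds 0.025 g of sediment per ml. The 10⁻⁶ dilution therefore represents about 2.5 × 10⁻⁸ g of sediment per ml.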

Taxonomic Placement with Phylogenomic and Phylogenetic Analyses
For whole-proteome-based phylogenomic analysis, the genome sequence was uploaded to the Type (Strain) Genome Server (TYGS), a free bioinformatics platform available at https://tygs.dsmz.de [1]. Guided by a previous genus delineation [4], the conserved DHHA1 domain-containing protein was chosen as a molecular marker. It was aligned using MAFFT (v7.487) [24] (Supplementary file 1) with the corresponding proteins of reference Priestia and Bacillus strains from the study by Gupta and colleagues [4]. The phylogeny was inferred under the LG substitution model with maximum likelihood and Bayesian estimation methods using IQ-TREE (v1.6.12) [25] with 1,000 replicates.
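The alignment-plus-tree workflow above can be scripted. The following Python sketch only builds the command lines and runs them on request. It assumes the `mafft` and IQ-TREE 1.6 `iqtree` executables are on PATH, and the file names are hypothetical. `mafft --auto` lets MAFFT choose an alignment strategy and writes the alignment to stdout. In IQ-TREE 1.6, `-bb 1000` requests 1,000 ultrafast bootstrap replicates and `-b 1000` requests the standard nonparametric bootstrap. The text does not say which alignment options or bootstrap type were actually used, so both choices here are assumptions.

```python
import shlex
import subprocess


def mafft_command(unaligned_fasta):
    """MAFFT with automatic strategy selection; the alignment goes to stdout."""
    return ["mafft", "--auto", unaligned_fasta]


def iqtree_command(alignment_fasta, model="LG", replicates=1000, ultrafast=True):
    """IQ-TREE 1.6 maximum-likelihood run with a fixed model and bootstrapping.
    ultrafast=True uses -bb (UFBoot); False uses -b (standard bootstrap)."""
    boot_flag = "-bb" if ultrafast else "-b"
    return ["iqtree", "-s", alignment_fasta, "-m", model, boot_flag, str(replicates)]


def run_pipeline(unaligned_fasta, aligned_fasta):
    """Run MAFFT, then IQ-TREE; requires both executables on PATH."""
    with open(aligned_fasta, "w") as out:
        subprocess.run(mafft_command(unaligned_fasta), stdout=out, check=True)
    subprocess.run(iqtree_command(aligned_fasta), check=True)


if __name__ == "__main__":
    # Hypothetical file names for the DHHA1 marker proteins.
    print(shlex.join(mafft_command("dhha1_proteins.fasta")))
    print(shlex.join(iqtree_command("dhha1_aligned.fasta")))
```

Keeping command construction separate from execution means the exact invocation can be logged alongside the tree, which helps with reporting parameters such as the substitution model and replicate count.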

Analysis of Biosynthetic Gene Clusters and Pathway Elucidation
To predict the clusters and the possible structures of secondary metabolites, the genome sequence was scanned with PRISM 4 [26]. In addition, the genome was analyzed with antiSMASH (v6.0) [17] using default parameters to detect clusters that PRISM might not have resolved. Guided by both PRISM and antiSMASH predictions, genes from both BGC analysis tools were manually selected, together with genes identified from the PGAP annotation. These genes were reanalyzed with BLASTp (NCBI (https://blast.ncbi.nlm.nih.gov/Blastp) and UniProt (https://www.uniprot.org/blast)) to assess their relevance to the predicted biosynthetic gene clusters. Structures of the predicted compounds were then drawn using ChemDraw (v8.0) [6], and the pathways were manually curated based on the functional annotations of their respective catalytic proteins.
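Cross-checking PRISM and antiSMASH predictions comes down to asking whether two tools call a cluster at the same genomic location. The following Python sketch reduces each prediction to a labelled closed interval on the same contig and pairs intervals that overlap by at least half of the shorter one. The 50% cut-off is an arbitrary choice for illustration. All coordinates in the example are hypothetical, since the text does not report cluster positions.

```python
from typing import List, NamedTuple, Tuple


class Cluster(NamedTuple):
    label: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlap_bp(a: Cluster, b: Cluster) -> int:
    """Shared base pairs between two closed intervals (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def supported_clusters(primary: List[Cluster], secondary: List[Cluster],
                       min_fraction: float = 0.5) -> List[Tuple[Cluster, Cluster]]:
    """Pair clusters whose overlap covers at least `min_fraction` of the
    shorter interval; min_fraction=0.5 is an illustrative threshold."""
    pairs = []
    for p in primary:
        for s in secondary:
            shorter = min(p.end - p.start, s.end - s.start) + 1
            if overlap_bp(p, s) >= min_fraction * shorter:
                pairs.append((p, s))
    return pairs


if __name__ == "__main__":
    # Hypothetical coordinates on a single contig.
    antismash = [Cluster("region 1 siderophore", 100_000, 130_000),
                 Cluster("region 7 carotenoid", 900_000, 925_000)]
    prism = [Cluster("PRISM siderophore", 105_000, 128_000)]
    for a, p in supported_clusters(antismash, prism):
        print(f"{a.label} supported by {p.label}")
```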

Ethics Statements
This project did not involve human subjects, animals, cell lines or endangered species. This manuscript is our original work and has not been published elsewhere.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.